Similarity Search in Metric Spaces

Author

  • Yuhui Wen

Abstract

Similarity search refers to any search problem that retrieves, from a set of objects, those that are close to a given query object as measured by some similarity criterion. It has many applications across computer science, from pattern recognition to textual and multimedia information retrieval. In this thesis, we examine algorithms designed for similarity search over arbitrary metric spaces rather than restricting ourselves to vector spaces. The contributions of this thesis are as follows. First, after defining pivot sharing and pivot localization, we prove probabilistically that the pivot sharing level should be increased for scattered data, while the pivot localization level should be increased for clustered data. Extensive experiments support this conclusion. Moreover, we propose two new algorithms, RLAESA and NGH-tree. RLAESA, which uses a high pivot sharing level and a low pivot localization level, outperforms MVP-tree, the fastest algorithm in the same category. NGH-tree serves as a framework that shows the effect of increasing the pivot sharing level on search efficiency, and it provides a way to improve search efficiency in almost all algorithms. The experiments with RLAESA and NGH-tree not only demonstrate their performance but also support the first conclusion above. Second, we analyze the impact of disk I/O on similarity search and propose a new algorithm, SLAESA, which improves search efficiency by replacing random I/O access with sequential I/O access.
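Pivot-based metric indexes such as LAESA, which RLAESA and SLAESA are named after, avoid computing most query-to-object distances by precomputing distances to a set of pivots and applying the triangle inequality. The following is a minimal, generic sketch of that pivot filter for a range query. It is not the thesis's RLAESA, NGH-tree, or SLAESA. It assumes an in-memory dataset, a Euclidean metric (any metric would work), and randomly sampled pivots, and the function names `build_pivot_table` and `range_search` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Generic LAESA-style pivot filter (illustrative sketch, not RLAESA/SLAESA).
# All function names here are hypothetical.
Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Assumed metric for the demo; any function obeying the metric axioms works."""
    return math.dist(a, b)


def build_pivot_table(data: Sequence[Point], pivots: Sequence[Point],
                      d: Metric) -> List[List[float]]:
    """Precompute d(o, p) for every object o and every pivot p (done once, offline)."""
    return [[d(o, p) for p in pivots] for o in data]


def range_search(query: Point, radius: float, data: Sequence[Point],
                 pivots: Sequence[Point], table: List[List[float]],
                 d: Metric) -> Tuple[List[int], int]:
    """Return (indices of objects within `radius` of `query`,
    number of metric evaluations spent)."""
    q_to_p = [d(query, p) for p in pivots]   # one evaluation per pivot
    evaluations = len(pivots)
    hits = []
    for i, row in enumerate(table):
        # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p,
        # so the largest difference is a lower bound on d(q,o).
        lower_bound = max((abs(qp - op) for qp, op in zip(q_to_p, row)),
                          default=0.0)
        if lower_bound > radius:
            continue                        # pruned: d(q,o) is never computed
        evaluations += 1
        if d(query, data[i]) <= radius:     # verify the surviving candidate
            hits.append(i)
    return hits, evaluations


if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(1000)]
    pivots = random.sample(data, 8)
    table = build_pivot_table(data, pivots, euclidean)
    hits, cost = range_search((0.5, 0.5), 0.05, data, pivots, table, euclidean)
    print(len(hits), "hits using", cost, "of", len(pivots) + len(data),
          "metric evaluations")
```

The filter is exact, so the result matches a brute-force scan; only the pivots and the surviving candidates cost a metric evaluation. How much gets pruned depends on which pivots are used for which objects. The abstract's notions of pivot sharing and pivot localization concern choices of this kind, but their precise definitions are in the thesis and are not modeled here. The sketch also keeps everything in memory, so it does not capture the disk I/O issue that SLAESA targets.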

Similar resources

A Content-Addressable Network for Similarity Search in Metric Spaces

Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for content-based retrieval in huge and ever-growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database uti...

Access Structures for Advanced Similarity Search in Metric Spaces

Similarity retrieval is an important paradigm for searching in environments where exact match has little meaning. Moreover, in order to enlarge the set of data types for which the similarity search can efficiently be performed, the notion of mathematical metric space provides a useful abstraction for similarity. In this paper we consider the problem of organizing and searching large data-sets f...

New Approaches to Similarity Searching in Metric Spaces

Title of dissertation: NEW APPROACHES TO SIMILARITY SEARCHING IN METRIC SPACES Cengiz Celik, Doctor of Philosophy, 2006 Dissertation directed by: Professor David Mount Department of Computer Science The complex and unstructured nature of many types of data, such as multimedia objects, text documents, protein sequences, requires the use of similarity search techniques for retrieval of informatio...

Aspects of Metric Spaces in Computation

Metric spaces, which generalise the properties of commonly-encountered physical and abstract spaces into a mathematical framework, frequently occur in computer science applications. Three major kinds of questions about metric spaces are considered here: the intrinsic dimensionality of a distribution, the maximum number of distance permutations, and the difficulty of reverse similarity search. I...

Spatial Selection of Sparse Pivots for Similarity Search in Metric Spaces

Similarity search is a fundamental operation for applications that deal with unstructured data sources. In this paper we propose a new pivot-based method for similarity search, called Sparse Spatial Selection (SSS). The main characteristic of this method is that it guarantees a good pivot selection more efficiently than other methods previously proposed. In addition, SSS adapts itself to the di...

Similarity Measures for Relational Databases

We enrich sets with an integrated notion of similarity, measured in a (complete) lattice, special cases of which are reflexive sets and bounded metric spaces. Relations and basic relational operations of traditional relational algebra are interpreted in such richer structured environments. A canonical similarity measure between relations is introduced. In the special case of reflexive sets it ...

Publication date: 2004